Abstract: Air quality is increasingly becoming a
public health concern, since some aerosol particles have
harmful effects on people's health. One widely available metric
of aerosol abundance is the aerosol optical depth (AOD). The 
AOD is the integrated light extinction coefficient over a vertical 
atmospheric column of unit cross section, which represents the 
extent to which the aerosols in that vertical profile prevent 
the transmission of light by absorption or scattering. The 
comparison between the AOD measured from the ground-based 
Aerosol Robotic Network (AERONET) system and the satellite 
MODIS instruments at 550 nm shows that there is a bias 
between the two data products. We performed a comprehensive 
analysis exploring possible factors which may be contributing 
to the inter-instrumental bias between MODIS and AERONET. 
The analysis used several measured variables, including the 
MODIS AOD, as input in order to train a neural network in 
regression mode to predict the AERONET AOD values. This 
not only allowed us to obtain an estimate, but also allowed us 
to infer the optimal sets of variables that played an important 
role in the prediction. In addition, we applied machine learning 
to infer the global abundance of ground level PM2.5 from 
the AOD data and other ancillary satellite and meteorology 
products. This research is part of our goal to provide air quality 
information, which can also be useful for global epidemiology 
studies. 

I. Introduction 

Atmospheric aerosols are tiny particles (solid and liquid) 
suspended in the atmosphere. Some aerosols are harmful
to people's health when inhaled. Moreover, atmospheric
aerosols play an important role in understanding the
global climate.

The aerosol optical depth (AOD), or optical thickness, is 
defined as the integrated extinction coefficient over a vertical 
column of unit cross section. The extinction coefficient is
the fractional depletion of radiance per unit path length and
represents how strongly aerosols prevent the transmission of
light by absorption and scattering.
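
In symbols, the definition above can be sketched as follows. The notation is standard but is introduced here for illustration: $\tau$ is the AOD at wavelength $\lambda$, $\sigma_{\mathrm{ext}}$ is the extinction coefficient at height $z$, $z_{\mathrm{TOA}}$ is the top of the atmosphere, and $T$ is the fraction of the direct beam transmitted along a vertical path (Beer-Lambert law).

```latex
\tau(\lambda) = \int_{0}^{z_{\mathrm{TOA}}} \sigma_{\mathrm{ext}}(\lambda, z)\, dz,
\qquad
T(\lambda) = e^{-\tau(\lambda)}
```

For example, an AOD of 0.5 corresponds to a vertical transmission of about $e^{-0.5} \approx 0.61$.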

In the past, much effort has been devoted to observing
aerosol characteristics, such as AOD, from space-based and
ground-based instruments. The Moderate Resolution Imaging
Spectroradiometer (MODIS), onboard the Terra and
Aqua satellites, retrieves AOD using dark target methods in
bands at 550, 670, 870, 1240, 1630, and 2130 nm over
the ocean, and at 470, 550, and 670 nm over land [1],
[2]. A global system of ground-based sun and sky scanning
sun photometers, called the Aerosol Robotic Network
(AERONET), also measures the AOD at various wavelengths
(340, 380, 440, 550, 675, 870, and 1020 nm) [3].
AERONET measurements are taken every 15 min during
daylight, and its level 2 quality control ensures that
AOD observations are accurate to within 0.01 for
wavelengths of 440 nm and higher. AOD measurements
from MODIS are available globally, whereas AERONET
measurements are available only for land locations, some
of which are coastal sites.

Ideally, the measurements of AOD from these two instruments
should match. However, biases do exist between
AERONET and MODIS measurements. In this study
we refer to the difference between the ground truth
AERONET AOD observations at 550 nm and the remotely
sensed MODIS AOD at 550 nm as the bias, i.e.,

Bias = AERONET AOD at 550 nm - MODIS AOD at 550 nm.

Figure 1 shows that a significant number of points do not fall
close to the 1:1 line. Figure 2 shows that the magnitude of the
bias is greater for larger values of AOD at 550 nm. Our
goal is to understand the factors that can delineate
these extrema and/or explain them statistically.
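
As a minimal sketch of this definition, the bias can be computed element by element for collocated observations. The function name `aod_bias` and the sample values are hypothetical and are used only for illustration; they are not data from this study.

```python
def aod_bias(aeronet_aod, modis_aod):
    """Bias = AERONET AOD - MODIS AOD for collocated 550 nm observations."""
    if len(aeronet_aod) != len(modis_aod):
        raise ValueError("inputs must contain the same number of observations")
    return [a - m for a, m in zip(aeronet_aod, modis_aod)]


# Hypothetical collocated AOD values at 550 nm (illustration only).
aeronet = [0.10, 0.25, 0.80]
modis = [0.12, 0.20, 0.55]

bias = aod_bias(aeronet, modis)
# A positive bias means MODIS underestimates AOD relative to AERONET.
```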

II. Previous Studies 

Previous MODIS aerosol validation studies compared the
Aqua and Terra MODIS-retrieved AODs with the ground-based
AERONET observations [4], [5], [2], [6].

From the studies of Normalized Difference Vegetation 
Index (NDVI), Brown et al. (2008) suggested that the surface 
type played a key role in explaining a significant fraction of 
the observed bias [7]. 

Lary et al. (2009) used machine-learning approaches to 
explore factors contributing to a persistent bias between 
AOD retrieved from MODIS and AERONET data [8]. Their 
work also suggested a link between the MODIS AOD bias 
and the surface type. The possible factors influencing the
bias might be associated with the measurement conditions,
such as the solar and sensor zenith angles, the solar and
sensor azimuth angles, the scattering angle, and the surface
reflectivity at the various measured wavelengths. In their
study they explained the AOD bias between MODIS and AERONET
by using the surface type, the solar zenith angle, the solar
azimuth angle, the sensor zenith angle, the sensor azimuth
angle, the scattering angle, and the reflectance at 550 nm as
input variables to the neural network.

Figure 1. A scatter diagram showing the comparison between the AOD
from AERONET and MODIS instruments at 550 nm. The regime of high
bias has been indicated by a circle.

Figure 2. Scatter-histogram showing the distribution of bias and MODIS
AOD measurements at 550 nm.

In this paper, we performed a comprehensive analysis in which
every possible combination of the variables was used as input
to train the neural network in regression mode to predict the
AOD values. We then compared how well the predictions matched
the observed AOD values. As a result we obtained
the best set of variables explaining the bias in the MODIS
AOD measurements.

Figure 3. Supervised neural network technique.


III. Neural Network Regression Technique 


Neural networks (NNs) are biologically inspired algorithms
used for classification or function approximation [9], [10],
[11]. NNs are widely used in pattern recognition, machine
learning and artificial intelligence. In addition, NNs have
found many applications in other fields such as geoscience,
remote sensing, and oceanography. Neural networks are also
referred to as multi-layer perceptrons because they
may consist of multiple layers (e.g., input, hidden and output
layers). Each neuron is connected to all neurons in
the adjacent layers, and each interconnection is assigned a
weight. The output of the k-th neuron can be written as the
transfer function applied to the weighted sum of its inputs:

$$y_k = \varphi\left( \sum_{j=1}^{m} w_{kj}\, x_j \right) \tag{1}$$

where $\varphi$ is the transfer function, $w_{kj}$ represents the weight
from unit $j$ to unit $k$, and $x_j$, $j = 1, \ldots, m$, are the $m$ input
variables to the neuron. During training the NN weights are
adjusted to fit the data. The learning and adjustment
of the weights are inspired by the synaptic learning behavior
of neurons.
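
The weighted-sum rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the hyperbolic tangent transfer function is an assumption, since the text does not state which transfer function was used.

```python
import math


def neuron_output(weights, inputs, transfer=math.tanh):
    """Apply the transfer function to the weighted sum of the inputs."""
    if len(weights) != len(inputs):
        raise ValueError("one weight is needed per input")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return transfer(weighted_sum)


def layer_output(weight_rows, inputs, transfer=math.tanh):
    """Evaluate every neuron of a fully connected layer on the same inputs.

    weight_rows[k][j] is the weight from input unit j to neuron k.
    """
    return [neuron_output(row, inputs, transfer) for row in weight_rows]
```

Passing `transfer=lambda v: v` gives a purely linear neuron, which makes the weighted sum itself easy to inspect.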

For an observation data set with n input variables, say
$\{x_1, x_2, x_3, \ldots, x_n\}$, the observed output variable, AOD, is
some function of these input variables. Our approach
approximates the function by non-parametric, non-linear NNs.
We selected a supervised NN method because an NN learns from
its input data and is free from assumptions about its
inputs. This allows us to explore various sets of inputs. The
goal here is to train the NN against the AERONET AOD
data as the target, as shown in Figure 3. The trained NN is
then used to predict the AOD for the given set of variables.

As we describe next, we applied a neural network regres- 
sion technique to learn the inter-instrumental bias and seek 
the best set of variables contributing to the bias. 


IV. Search for Optimal Set of Variables for Bias Reduction

We observed AOD at 550 nm along with 14 other vari- 
ables that are listed below. For brevity, we have denoted the 
variables by the corresponding numbers in the tables that 
follow. 

1) Aerosol optical depth at 550 nm (AOD0550) 

2) Aerosol optical depth at 470 nm (AOD0470) 

3) Aerosol optical depth at 660 nm (AOD0660) 

4) Mean reflectance at 470 nm (mref0470) 

5) Mean reflectance at 550 nm (mref0550) 

6) Surface reflectance at 660 nm (surfre0660) 

7) Surface reflectance at 470 nm (surfre0470)

8) Surface reflectance at 2100 nm (surfre2100)

9) Cloud fraction from land aerosol cloud mask (cfrac) 

10) Quality assurance (QAavg) 

11) Solar zenith angle (SolarZenith) 

12) Solar azimuth angle (SolarAzimuth)

13) Viewing zenith angle (SensorZenith)

14) Sensor azimuth angle (SensorAzimuth)

15) Scattering angle (ScatteringAngle) 

The number of combinations of n variables, taken
k at a time, is given by the binomial coefficient $^{n}C_{k}$. Our AOD
data set contains 15 measured variables, and we considered
all the possibilities $^{15}C_{15}$, $^{15}C_{14}$, $^{15}C_{13}$, $\ldots$, $^{15}C_{2}$.
Thus, there are 32,781 possible combinations to be explored.
We therefore treat this as a search problem, where the search
is over the possible sets of variables that can best fit the
observed data. In the end, the non-relevant variables will be
absent from the best fitting set of variables.
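
A minimal sketch of this enumeration in Python, assuming the variables are identified by the numbers 1 to 15 and that subset sizes run from 15 down to 2 as listed above. The names `candidate_subsets` and `n_subsets` are illustrative. Under that reading the count works out to 2^15 - 1 - 15 = 32,752, so the slightly different total quoted above may reflect a different counting convention.

```python
from itertools import combinations
from math import comb

VARIABLE_IDS = range(1, 16)  # the 15 variables, numbered as in the list above


def candidate_subsets(ids, min_size=2):
    """Yield every subset of ids with at least min_size members, largest first."""
    ids = list(ids)
    for k in range(len(ids), min_size - 1, -1):
        yield from combinations(ids, k)


# Closed-form count: sum of 15-choose-k for k = 2, ..., 15.
n_subsets = sum(comb(15, k) for k in range(2, 16))
```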

For each combination, one at a time, we trained
the NN with AERONET AOD as the target variable, and
then predicted the AERONET AOD from the
trained network. The NN was a feed-forward network
trained by back propagation, with a hidden layer having
200 nodes, as shown in Figure 4. The training was done
with the Levenberg-Marquardt algorithm, with mean-squared
error as the performance measure, as provided by the Matlab
NN toolbox. When training a neural network, we randomly
split the training data set into three portions, in the ratio
80 : 10 : 10. The first 80% portion is used to train the NN
weights using an iterative process. For each iteration, we
evaluate the current root mean square (RMS) error of the
neural network by using the second 10% portion of the data
(this portion is not used in the training). We use the RMS
error and how it changes with the training iterations (known
as epochs) to determine the convergence of our training.
When the training is complete, we use the final 10% of the
data as the validation data set.
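
The 80 : 10 : 10 split and the RMS error used to monitor convergence can be sketched as follows. This is not the Matlab toolbox code; the function names, the fixed random seed, and the use of sample indices are assumptions made for illustration.

```python
import math
import random


def split_80_10_10(n_samples, seed=0):
    """Randomly split sample indices into 80% / 10% / 10% portions."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = (8 * n_samples) // 10
    n_monitor = n_samples // 10
    train = indices[:n_train]                        # used to fit the weights
    monitor = indices[n_train:n_train + n_monitor]   # RMS error tracked per epoch
    final = indices[n_train + n_monitor:]            # held out until training ends
    return train, monitor, final


def rms_error(predicted, observed):
    """Root mean square difference between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Evaluating `rms_error` on the `monitor` portion after each epoch gives the convergence curve described above, while the `final` portion is touched only once training has stopped.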

Since the neural network constructs a mapping between
the set of input variables and the output variables, the most
relevant set of variables is the one that can best reproduce the
target data. We explored all combinations of variables, each of
which provided a fit to the observed AERONET AOD data. The
end product is a regression from the available satellite
variables to the observed AERONET
AOD. This is a massive number crunching exercise, so we
automated the workflow for each combination by writing
job-parallel code.

Figure 4. Matlab's neural network toolbox was used to train the neural
nets. A screenshot of the 12 variable training case is shown.
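
The search loop itself can be sketched as below. Here `score` is a hypothetical stand-in for the expensive step (training an NN on a subset and measuring agreement with AERONET), and `toy_score` is purely illustrative. The loop is serial for simplicity; because each subset is evaluated independently, the evaluations could be dispatched as separate parallel jobs.

```python
from itertools import combinations


def rank_subsets(variable_ids, score, min_size=2, top=5):
    """Score every subset of variable_ids and return the best `top` of them.

    `score` is any callable mapping a tuple of variable ids to a number,
    where larger means better agreement with the target data.
    """
    ids = sorted(variable_ids)
    scored = []
    for k in range(min_size, len(ids) + 1):
        for subset in combinations(ids, k):
            scored.append((score(subset), subset))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


def toy_score(subset):
    """Illustrative score: rewards subsets containing 2 and 4, penalizes size."""
    return len({2, 4} & set(subset)) - 0.01 * len(subset)


best = rank_subsets(range(1, 6), toy_score, top=3)
```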

V. Similarity Measure between Predicted and Observed AOD

In order to quantify the agreement between the observed
and predicted data, we used both the correlation coefficient,
appropriate for normally distributed variables, and the
mutual information, appropriate for variables of arbitrary
probability distribution. The predictions made by the most
relevant set of variables show the highest correlation
coefficient or highest mutual information with the observed data.
In the appendix, we show that mutual information (MI) is
a more general measure than the correlation coefficient: when
we assume a normal distribution, the normalized MI reduces to
the correlation measure. It therefore makes sense to use MI as
the general measure of correlation between the observed
and predicted sets, as many of the variables are not normally
distributed.

Figure 5. Scatter diagram showing the comparison between the AOD from
AERONET and the NN correction. The NNs are able to learn and correct
the bias. The same NN bias correction was extended to explore all
possible combinations of variables. Table I shows that the most
relevant set of variables consists of only 14 variables,
as opposed to the complete set of 15 variables.

In the literature, there are several methods to estimate MI
from data [12], [13], [14]. We applied the variable bin width
histogram approach [15], [16] to compute the normalized MI
between the observed and predicted AOD. Higher values indicate
better agreement between the observed and predicted
sets, and thus identify the most relevant sets of input
variables. We ranked the sets in decreasing order of MI;
the results are presented in Table I.
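
A histogram-based MI estimate can be sketched as follows. This is not the specific estimator of [15], [16]: equal-frequency (quantile) binning is swapped in here as one simple way to obtain variable-width bins, and the bin count of 8, the natural logarithm, and all function names are assumptions. The normalization uses the mapping S = sqrt(1 - exp(-2 I)) given in the appendix.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=8):
    """Plug-in MI estimate (in nats) from a 2-D histogram of x and y."""
    n = len(x)
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi


def normalized_mi(x, y, n_bins=8):
    """Map MI onto [0, 1) with S = sqrt(1 - exp(-2 I))."""
    mi = max(mutual_information(x, y, n_bins), 0.0)
    return math.sqrt(1.0 - math.exp(-2.0 * mi))
```

With a finite number of bins the estimate saturates at log(n_bins), so the normalized value stays slightly below 1 even for identical inputs.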

We note that we can construct the best regression fit of the
MODIS parameters to predict the AERONET AOD when
certain MODIS parameters are absent from the combination.
Table II shows the variables absent from each combination.
These absent variables include the aerosol optical depth at 550
nm (AOD0550), the AOD at 660 nm (AOD0660), the cloud fraction
from the land aerosol cloud mask (cfrac), the surface
reflectance at 470 nm (surfre0470), the surface reflectance at
660 nm (surfre0660), the sensor azimuth angle (SensorAzimuth),
and the solar zenith angle (SolarZenith). The fact that the
neural networks performed better, or could reproduce the
observed AOD data, in the absence of certain variables
suggests that the presence of the aforementioned
variables in the NN input contributes to the observed
bias.
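
The absent variables for a given combination follow directly from the numbered variable list in Section IV. A small sketch, where the dictionary and function names are illustrative and the two example rows are the first two combinations of Table I:

```python
VARIABLE_NAMES = {
    1: "AOD0550", 2: "AOD0470", 3: "AOD0660",
    4: "mref0470", 5: "mref0550",
    6: "surfre0660", 7: "surfre0470", 8: "surfre2100",
    9: "cfrac", 10: "QAavg",
    11: "SolarZenith", 12: "SolarAzimuth",
    13: "SensorZenith", 14: "SensorAzimuth", 15: "ScatteringAngle",
}


def absent_variables(combination):
    """Return the names of the variables missing from a combination."""
    present = set(combination)
    return [name for number, name in sorted(VARIABLE_NAMES.items())
            if number not in present]


row1 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
row2 = [1, 2, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15]
```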

Therefore, the comprehensive search provides
insight into the factors explaining the bias between
the MODIS AOD and AERONET AOD, and it also yields
the best performing NN, which can then be used to estimate
bias corrected AOD observations [17]. Figure 5 shows
the bias corrected AOD plotted against the AERONET
AOD. Clearly, the bias at the higher values of the AOD
has been corrected, and the AOD values lie close
to the 1 : 1 line. This best prediction was obtained by the set
consisting of the following variables: AOD at 470 nm,
AOD at 660 nm, mean reflectance at 470 nm, mean
reflectance at 550 nm, surface reflectance at 660 nm, 470
nm, and 2100 nm, cloud fraction, quality assurance value,
solar zenith angle, solar azimuth angle, sensor zenith angle,
sensor azimuth angle, and scattering angle.

Table I
Mutual information (MI) and correlation coefficient (r) between
observed and predicted AOD. The table is arranged in
descending order of MI for various combinations of
input variables.

Combination                         MI      r
2 3 4 5 6 7 8 9 10 11 12 13 14 15   0.771   0.927
1 2 4 5 6 7 8 10 11 12 13 15        0.769   0.926
1 2 3 4 5 6 8 10 11 12 13 14 15     0.768   0.926
1 2 4 5 6 8 9 10 11 12 13 14 15     0.766   0.926
1 2 3 4 5 6 7 8 9 10 12 13 15       0.765   0.926
1 2 4 5 7 8 10 12 13 14 15          0.764   0.925
1 2 4 5 6 7 8 9 10 11 12 13 14      0.762   0.921
2 3 4 5 6 7 8 10 11 12 13 14        0.761   0.924
1 3 4 5 6 7 8 10 11 12 13 14 15     0.760   0.924
1 2 4 5 7 8 10 11 12 13 14 15       0.759   0.925
1 2 4 5 6 7 10 11 12 13 14 15       0.759   0.925
1 3 4 5 6 7 8 9 10 11 12 13 15      0.756   0.924
1 2 4 5 6 8 10 11 12 13 15          0.756   0.921
1 2 4 5 7 8 10 11 12 13 15          0.755   0.923
2 3 4 5 7 8 9 10 11 12 13 15        0.755   0.924
1 2 3 4 5 6 7 8 9 10 11 12 15       0.755   0.921
1 2 3 4 5 7 8 9 10 11 12 13 14 15   0.754   0.875
1 3 4 5 6 7 9 10 11 12 13 15        0.754   0.924
1 3 4 5 6 8 10 12 13 14 15          0.753   0.922
2 3 4 5 6 9 10 11 12 13 14 15       0.753   0.923

Table II
Absent variables in Table I

Row #   Absent variables in the set shown in Table I
1       AOD0550
2       AOD0660, cfrac, SensorAzimuth
3       surfre0470, cfrac
4       AOD0660, surfre0470
5       SolarZenith, SensorAzimuth
6       AOD0660, surfre0660, cfrac, SolarZenith
7       AOD0660, ScatteringAngle
8       AOD0550, cfrac, ScatteringAngle
9       AOD0470, cfrac
10      AOD0660, surfre2100, cfrac

VI. Estimating Global PM2.5 Abundance

We also applied machine learning to infer the global
distribution of ground level particulate matter with a
diameter of 2.5 micrometers or less (PM2.5) from the
AOD data and other ancillary satellite and meteorology
products. The abundance of PM2.5 at ground level is known
to adversely impact public health. For example, it is known
to have serious impacts on people with heart disease,
asthma, or certain cardiovascular diseases [18].

Figure 6. The yearly average PM2.5 distribution for the continental USA,
generated from satellite data, weather analysis, and roadside GLP
observations.

Ground monitoring stations are only available at certain
locations, so we do not have in-situ observations of
ground level PM2.5 (GLP) for the whole planet. It is not
always possible to obtain the GLP in areas with rapidly
increasing population, which are also the newer locations of air
pollution. However, by using remotely sensed data we
can provide a daily GLP data product for the entire globe. By
constructing an automated workflow with large computing
facilities, it will be possible to examine the GLP for any
location in the world and also to analyze trends
in global air pollution.

Figure 6 shows the GLP for the continental USA generated
from satellite data, weather analysis, and roadside GLP
observations. Currently there is full coverage of ground
level PM2.5 in the US, and nearly full global coverage.
The estimation of GLP has important health applications.
Since the GLP products can be used to construct a global
air pollution map, one can construct an application for a
personal digital assistant (PDA) that warns of adverse outdoor
conditions. We have implemented such an approach, and this
is work in progress.

The global estimation with a continuous spatial and tem- 
poral coverage can be critically useful in making health care 
decisions. This will also help in making public policy deci- 
sions for improving the GLP and environmental conditions. 

VII. Conclusions and Future Work

In this paper, we studied factors influencing the bias in the 
observed AOD values between the MODIS and AERONET 
instruments. We applied a supervised neural network method
in regression mode, training the NN with the AERONET data set
as the target, and computed predictions of AOD from
the neural nets. We performed an exhaustive search over the
possible combinations of input variables.


The best prediction of AOD, which had maximum mutual
information with AERONET, was provided by the set consisting
of the following variables: AOD at 470 nm and 660 nm,
mean reflectance at 470 nm and 550 nm, surface reflectance at
660 nm, 470 nm, and 2100 nm, cloud fraction, quality assurance
value, solar zenith angle, solar azimuth angle, sensor zenith
angle, sensor azimuth angle, and scattering angle. The best
agreement between the observed and predicted AOD occurred
when some of the variables were missing from the input
combinations. For example, the best set is missing the AOD
at 550 nm itself, i.e., the neural network performs best
in the absence of MODIS AOD values at 550 nm. Similarly,
for various other combinations, the absence of one or a couple
of variables, such as the AOD at 660 nm, the cloud fraction
from the land aerosol cloud mask, the surface reflectance at
470 nm and at 660 nm, the sensor azimuth angle, or the solar
zenith angle, seems to indicate that their presence
contributes to the observed bias.

The method of estimating the AOD was also applied
to estimate the ground level PM2.5, which is known to
have adverse health effects. In this case we used the AOD
measurements and ancillary information as input and trained
the NN in regression mode to estimate the GLP. The global
estimation with a continuous spatial and temporal coverage
can critically help in making healthcare decisions and
public policy decisions for improving the GLP and
environmental conditions.

Appendix 

The correlation coefficient (Pearson's correlation) is a widely
used measure of dependence between two variables, and represents
the normalized measure of the strength of their linear
relationship. The correlation coefficient $\rho_{X,Y}$ between two
random variables $X$ and $Y$ with expected values $\mu_X$ and
$\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$ is defined as:

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \tag{2}$$

where $E$ is the expected value operator, cov denotes covariance,
and $\rho$ is a widely used notation for Pearson's
correlation.

The correlation coefficient is defined only if both of
the standard deviations are finite and
nonzero. The correlation coefficient ranges from -1 to 1.
Values close to 1 (or -1) suggest that
there is a positive (or negative) linear relationship between
the data columns, whereas values close to or equal
to 0 suggest there is no linear relationship between the
data columns. It therefore captures only linear
relationships between two variables.
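
The limitation to linear relationships can be illustrated with a short sketch. The function name `pearson_r` and the sample values are illustrative only: a straight line gives a coefficient of 1, while a symmetric parabola, although fully determined by x, gives a coefficient of 0.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("undefined when either standard deviation is zero")
    return cov / math.sqrt(var_x * var_y)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2.0 * v + 1.0 for v in x]    # exact linear dependence on x
parabola = [v * v for v in x]          # exact but non-linear dependence on x
```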

Mutual information quantifies the mutual dependence
between two variables by taking into account the whole
probability distribution function (PDF) of the
variables. Mutual information (MI) is defined as follows in
discrete form:

$$I(X,Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \tag{3}$$

which is a special case of a measure called the Kullback-Leibler
divergence [19], [20]. If $X$ and $Y$ are statistically
independent, then

$$p(x,y) = p(x)\,p(y). \tag{4}$$

In this case, the mutual information becomes 0, showing
independence. A proper mapping of the form

$$S(X,Y) = \sqrt{1 - e^{-2 I(X,Y)}} \tag{5}$$

normalizes the measure of general correlation as depicted
by the MI [21], [22], [23]. In the case when $X$ and $Y$ are
normally distributed,

$$(X,Y) \sim \mathcal{N}(\mu, K) \tag{6}$$

where $K = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}$, the mutual information
reduces to

$$I(X,Y) = -\frac{1}{2} \log(1 - \rho^2). \tag{7}$$

So that

$$S(X,Y) = \sqrt{1 - e^{-2 I(X,Y)}} = |\rho(X,Y)|. \tag{8}$$

This relation shows the generality of the normalized correlation
measure.
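
The step from Eq. (7) to Eq. (8) can be spelled out explicitly, assuming the logarithm in Eq. (7) is the natural logarithm:

```latex
e^{-2 I(X,Y)} = e^{\log(1-\rho^{2})} = 1 - \rho^{2}
\quad\Longrightarrow\quad
S(X,Y) = \sqrt{1 - (1 - \rho^{2})} = \sqrt{\rho^{2}} = |\rho| .
```

As a numerical illustration, $\rho = 0.6$ gives $I = -\tfrac{1}{2}\log(0.64) \approx 0.223$ and $S = \sqrt{1 - 0.64} = 0.6$.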